
[CORE-7061] sr/store: Deep copy schema_def iobufs at the interface #23114

Merged

Conversation


@oleiman oleiman commented Aug 28, 2024

sharded_store's main purpose is to supply a consistent interface to an underlying (sharded) in-memory store. So to process a given request, it must first work out the home shard for the desired schema ID or subject, dispatching requests to that shard.

Recently, schema registry was refactored to store schema definitions as iobufs (rather than contiguous strings) to avoid large allocations. To avoid semantically unnecessary copies, get_schema_definition was implemented to return an iobuf share across the shard boundary. This is an unsafe operation, as the reference counting and deletion machinery of the underlying temporary buffers is completely unsynchronized. This is UB in principle (a data race) and leads to memory access faults (use-after-free, etc.) in practice.

So this commit changes 'store::get_schema_definition' to return a copy of the stored schema definition (iobuf), eliminating any cross-shard sharing of the underlying temporary buffers. This approach is rather blunt and carries some cost at deallocation time (the seastar allocator must work out the home shard for reclaiming each component temporary buffer of a given schema definition), but it should be safe since the component temporary buffers are never live on multiple shards simultaneously.

Note that this path could be optimized somewhat by returning a shared iobuf managed by foreign_ptr and carrying out the necessary copy (a read-only operation) on the requesting shard.

This commit also includes a "unit test" that is more of a memory access canary. Reverting the change in 'store::get_schema_definition' causes the test to fail reliably, producing memory faults in line with the reasoning above. With the change in place, the test reliably runs to completion without issue.

Backports Required

  • none - not a bug fix
  • none - this is a backport
  • none - issue does not exist in previous branches
  • none - papercut/not impactful enough to backport
  • v24.2.x
  • v24.1.x
  • v23.3.x

Release Notes

Bug Fixes

  • Fixes a regression in schema registry where concurrent requests could lead to memory faults

Signed-off-by: Oren Leiman <oren.leiman@redpanda.com>
@oleiman oleiman requested a review from BenPope August 28, 2024 18:57
@oleiman oleiman self-assigned this Aug 28, 2024
@oleiman oleiman requested a review from ballard26 August 28, 2024 18:58
@oleiman (Member Author) commented Aug 28, 2024

@ballard26 - talked this over with @BenPope a bit, and we came to the conclusion that returning an iobuf copy across the shard boundary should be safe since it's temporaries all the way out to the call site and no data is shared.

It's a pessimization, and I assume we could do something w/ foreign_ptr to push the copy operation out to the call site. But it doesn't seem strictly necessary here since the path isn't particularly hot. WDYT? I get the impression that the seastar allocator should do the right thing when it tears down the copy, but I don't exactly know.

@ballard26 (Contributor) commented

> @ballard26 - talked this over with @BenPope a bit, and we came to the conclusion that returning an iobuf copy across the shard boundary should be safe since it's temporaries all the way out to the call site and no data is shared.
>
> It's a pessimization, and I assume we could do something w/ foreign_ptr to push the copy operation out to the call site. But it doesn't seem strictly necessary here since the path isn't particularly hot. WDYT? I get the impression that the seastar allocator should do the right thing when it tears down the copy, but I don't exactly know.

I agree that using a foreign_ptr isn't needed here. It's done in the fetch path as an optimization to defer the copy, since we don't always include the read batches in the fetch reply; in those cases, deferring allows the fetch to avoid the copy altogether.

The seastar allocator will also recognize that the memory being freed doesn't belong to the current shard and will return it to the proper shard, so no worries there.

The only real concern is whether two shards can hold references to one of the underlying temporary_buffers in parallel, since I think that's the cause of most if not all of the race conditions. As you said, since the copied iobuf is never shared on the original shard, it should be safe to move it to the foreign shard and destroy it there.

One gotcha may be that in debug mode we restrict certain iobuf operations to the shard the buffer was originally allocated on. So if a dev tries any of those operations on the copied iobuf, they may find debug-mode tests failing. That isn't a correctness issue, though, and is easily discovered.

@BenPope BenPope left a comment


Awesome

Comment on lines +35 to +36
// Upsert a large(ish) number of schemas to the store, all with different
// subject names and IDs, so they should hash to different shards.
Member


The same schema should end up with the same ID, so the comment is a little confusing.

@oleiman (Member Author) commented Aug 29, 2024

In real operation, yeah, but I thought that disambiguation happened somewhere else, like on the command apply path? It looks like when I directly upsert, the sharded store is happy to shove an (id, def) pair into the in-memory schema map on shard hash(id), as requested. I might be missing/misreading something.

@michael-redpanda michael-redpanda merged commit 02cb28b into redpanda-data:dev Aug 29, 2024
20 checks passed
@vbotbuildovich (Collaborator) commented

/backport v24.2.x

@vbotbuildovich (Collaborator) commented

/backport v24.1.x

@vbotbuildovich (Collaborator) commented

/backport v23.3.x

